Educational and social programs often develop from weak or imprecise conceptualization relating the program's system of input variables to its claimed outcomes. Evaluation personnel can contribute both to the initial development of a program and to the fair evaluation of such programs by learning to formally characterize programs and to construct causal models of them. The evaluation effort represents an attempt to determine the "correctness" of the program's existing conceptualization. If properly carried out, it permits the developer/sponsor to strengthen, add, or delete components which are found to be non-functional. In this paper, the authors discuss the concept of causal model building. They illustrate their ideas with an example of how causal model construction procedures were used to assist in the evaluation of a complex early childhood program.
Objectives

The purpose of this paper is to discuss the concept of causal model building as applied to educational and social program evaluation. This objective will be pursued through four steps:

1) a presentation of the concept of causal modeling as it emerges from formal program characterization procedures;
2) a review of current prominent conceptions about educational and social program evaluation and their relation to causal modeling;
3) the presentation of several examples of the use of causal model building in evaluation studies; and
4) an examination of the policy implications for the evaluator.

Introduction

In recent years, the demand for evaluation activities has grown as the public, governmental agencies, and social scientists themselves have begun to require more accountability from tax-supported social programs. Initially (and naively), evaluation was undertaken to provide a simple overall assessment of a program's ability to achieve its specified objectives or outcomes. This conception of evaluation proved to be limited, however. The experience of evaluators revealed that the task was not so simple, the evidence was often hard to come by, and the conclusions were not so clear cut.

New dimensions began to be added to the scope of an "evaluation." These included:

- concern about the merits of the objectives;
- sensitivity to unanticipated effects of programs;
- the information needs of different audiences;
- the different evidence requirements of formative and summative evaluations; and
- the analysis of the political contexts in which evaluations are done.

The adequacy of the conceptualization of the "program," the thing to be evaluated, has not, however, been treated as an equally important dimension of the evaluation effort.



The conceptualization of a program involves two things: the construction of a model of the important operational components of the program, and the specification of the expected linkages among program components and between program components and program outcomes. From this point of view, an ideally conceptualized program is one which specifies how the system of input variables called "program" operates to effect the outcome variables claimed for it. This is the causal model. The program itself is then conceived of as merely a vehicle for delivering the system of input variables.

The causal model embedded in the program is a falsifiable one. An evaluator can design a test of the relationships between the systems of input and output variables, and compare the empirical observations with the claims made for the program. This conception, of course, assumes that the program rhetoric accurately reflects the causal model. It also assumes that the program is properly implemented, that is, that all of the claimed causal elements are represented.

Unfortunately, most programs vary widely from the ideal. It is rare, in fact, to find a program with an explicit causal model specifying the relations among input components or between input components and outcomes. In this situation, the evaluator faces several model-building tasks which he must perform before the program can receive a fair evaluation.
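The idea of a falsifiable causal model can be made concrete with a small example. The Python sketch below represents a program as a directed graph whose arrows are the claimed linkages between components and outcomes. It is an illustration under stated assumptions, not a procedure from this paper:

- The component and outcome names are hypothetical.
- A "gap" means a component with no path to any claimed outcome, or a claimed outcome that no component is expected to produce.

Each arrow is one linkage an evaluator could test.

```python
from collections import deque

# Sketch: a program's causal model as a directed graph (adjacency lists).
# All component and outcome names below are hypothetical placeholders.

def reachable(graph, start):
    """Return every node reachable from `start` along directed links."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def audit_model(graph, components, outcomes):
    """List the testable linkages and flag gaps in the claimed causal model."""
    linkages = [(a, b) for a, targets in graph.items() for b in targets]
    # Components with no path to any claimed outcome cannot be credited with effects.
    unlinked = [c for c in components if not reachable(graph, c) & set(outcomes)]
    # Claimed outcomes that no component is expected to produce.
    reached = set().union(*(reachable(graph, c) for c in components))
    orphaned = [o for o in outcomes if o not in reached]
    return linkages, unlinked, orphaned

model = {
    "teacher_training": ["teacher_behavior"],
    "teacher_behavior": ["curriculum_coverage"],
    "curriculum_coverage": ["achievement"],
    "parent_newsletter": [],
}
components = ["teacher_training", "parent_newsletter"]
outcomes = ["achievement", "self_concept"]

linkages, unlinked, orphaned = audit_model(model, components, outcomes)
print(len(linkages))  # 3 falsifiable linkages
print(unlinked)       # ['parent_newsletter']
print(orphaned)       # ['self_concept']
```

A flagged component is a candidate for strengthening or deletion. A flagged outcome is a claim the program's rhetoric makes without any mechanism to support it.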
Formative evaluation activities can play an important role in the development of such a causal model of the program. A formative evaluation can help a program developer to:

- specify the expected outcomes more clearly;
- develop appropriate outcome measures;
- conceptualize the program; and
- through the use of program tryouts and corrective feedback, construct the program so that it has a greater likelihood of actually delivering the intended set of input variables or experiences.



When these prior functions have been satisfied, the activities involved in summative program evaluation are less difficult to accomplish. Most summative evaluations, however, are performed on programs which have had little formative evaluation. A school board, curriculum committee, or State agency may decide to adopt a program on a trial basis for either of two reasons: the general description of the program appears to meet a local or regional need, or the program has "nice" public relations value. An evaluator is then brought in to gather and summarize data on the program's ability to effect important outcomes.

The evaluator is likely to face multiple problems in attempting to do this evaluation job. The most fundamental problem concerns the order in which the program was developed. Ideally, development would follow a linear process: from (a) a theory, which specifies that (b) certain program components (c) will regularly lead to certain outcomes. In practice, the program components were probably developed first. The outcomes claimed for the program usually followed the development of the actual program components in time. The claimed outcomes may therefore not accurately reflect what operationally goes on in the program, or what one could logically or empirically expect as an outcome. The rhetoric about the program, or a set of theoretical conceptions about it, is most often the last to occur, as the explanatory and public relations information brochures are made up.

Thus, this sequence opens a possible disjunction among the program rhetoric, the operational program components, and the claimed outcomes. Where that disjunction exists, the evaluator must choose among three alternatives:

1) accept the rhetoric or characterization of the program provided by the developers;
2) develop an alternative theoretical model based on his own conception of what the actual components of the program are; or
3) modify or revise the program to fit some other causal model to be tested.



The first alternative is likely to yield a deficient causal model, while the second may lead to a deficient program with an appropriate causal model. Either of these strategies can lead to unmeasured effects, findings of no difference, or both. The third alternative requires that the evaluator work with the developer to construct a more accurate causal model of the program. A more complete and integrated program will be the result, and the payoff will come in a fairer test of the program.

Types of Evaluation Studies and Causal Analysis

At least three types of evaluation studies can be distinguished in which causal model analysis might be employed. These are 1) exploratory, 2) confirmatory, and 3) optimization evaluation studies. The three types differ in the basic questions they ask about the program or product being evaluated. They also differ in the index they employ for examining performance, adjudging program or product adequacy, and arriving at "causal" inferences.

Exploratory evaluation studies have as their basic question, "Is there a program (product)?" That is, have the developers contributed "something" that might be worth continuing to examine, to produce, or to promote? The basic research paradigm for examining this question is reflected in the diagram below.



[Figure: Research Paradigm for Exploratory Evaluation. Vertical axis: Performance. Horizontal axis: Components of Product or Program (none, some, all). Acceptability criteria: Index, attainment of performance criteria.]



Alternatively, we could also do exploratory evaluation by asking questions related to, "What kind of performance might I be likely to attain if I used a given set of components?"

The existence of some kind of "program" or "product" is inferred from the relationship of the observed performance to the acceptability criteria. This relationship is examined under at least two levels of implementation ("none" and "some" or "all") of the major components of which the program or product is thought to consist. The hypothetical data in the figure show how such performance might be plotted to demonstrate a minimal "causal" relationship between the program and the performance. This is the kind of paradigm that is most often used to evaluate the effectiveness of a program or a product. (Alternatively, it is also possible to ask, "What kind of performance might I expect to get if I put together a given type of program, based on previously collected data and observation?" However, this type of evaluation is less typically employed.)
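The exploratory paradigm reduces to comparing an attainment index across implementation levels. The Python sketch below makes two assumptions:

- The index is the fraction of acceptability criteria met.
- A "program" is detected whenever attainment under "some" or "all" exceeds attainment under "none".

The scores and criteria are invented for illustration.

```python
# Sketch of the exploratory paradigm: does attainment of the performance
# criteria rise as program components are implemented?
# Scores, criteria, and the decision rule are hypothetical.

def attainment_index(scores, criteria):
    """Fraction of performance criteria met; `scores` and `criteria` are aligned lists."""
    met = sum(1 for s, c in zip(scores, criteria) if s >= c)
    return met / len(criteria)

criteria = [70, 60, 80, 50]   # acceptability criterion for each outcome
observed = {                  # mean performance at each implementation level
    "none": [62, 55, 71, 52],
    "some": [71, 58, 79, 60],
    "all":  [78, 66, 83, 64],
}

index = {level: attainment_index(s, criteria) for level, s in observed.items()}
print(index)  # {'none': 0.25, 'some': 0.5, 'all': 1.0}

# Minimal "causal" signal: attainment with components present exceeds attainment without them.
program_detected = max(index["some"], index["all"]) > index["none"]
print(program_detected)  # True
```

A real study would compare groups statistically rather than apply a bare inequality. The rule here only mirrors the logic of the figure.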

Confirmatory evaluation studies have a different basic question underlying their use. The issue is not whether there is some kind of program or product there. Rather, it is whether the program or product is some specific, identifiable subset of program components which are required to generate the program's "effects." The basic research paradigm for examining this question is reflected in the diagram below.



[Figure: Research Paradigm for Confirmatory Evaluation. Vertical axis: Performance. Horizontal axis: Various Configurations of Components. Acceptability criteria: Index, level of attainment of discrete performance criteria with the use of discrete components.]
The causal relationship between discrete performance criteria and related components is inferred from how variations in the level of implementation of discrete program components influence the levels of the discrete performances linked logically to them. As the hypothetical data in the figure show, a given configuration of components may be tried to determine whether it is a necessary and sufficient condition for the attainment of certain performance criteria.

Few evaluation studies have used this paradigm. Evaluators who regularly collect data on the "degree of implementation" of programs are closer to dealing with the appropriate levels-of-variation issues. Even so, they fall short of "causal" inferences because too often they fail to connect the observed variation in discrete effects logically to discrete program elements. That step is required to have an internally consistent program.
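The confirmatory question of whether a component is a necessary and sufficient condition for a discrete criterion can be checked directly once trials are recorded. This Python sketch makes these assumptions:

- Each trial records which components were implemented and which criteria were attained.
- The component labels "A" and "B" and the two criteria are hypothetical.

```python
# Sketch of the confirmatory paradigm: is a component a necessary and/or
# sufficient condition for attaining a discrete criterion?
# Configurations, component names, and results are hypothetical.

trials = [
    # (components implemented, criteria attained)
    ({"A", "B"}, {"reading", "number"}),
    ({"A"},      {"reading"}),
    ({"B"},      {"number"}),
    (set(),      set()),
]

def necessary(component, criterion, trials):
    """The criterion is never attained when the component is absent."""
    return all(criterion not in met for comps, met in trials if component not in comps)

def sufficient(component, criterion, trials):
    """The criterion is always attained when the component is present."""
    return all(criterion in met for comps, met in trials if component in comps)

print(necessary("A", "reading", trials), sufficient("A", "reading", trials))  # True True
print(necessary("A", "number", trials), sufficient("A", "number", trials))    # False False
```

This ties each discrete effect to a discrete program element, which is the link the text says most implementation studies omit. Real data would call for a probabilistic version of these all-or-none checks.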

Optimization evaluation is a variation of confirmatory evaluation in which the basic question becomes, "Can the program or product be improved?" The basic research paradigm might employ a response surface design, as illustrated below.



[Figure: Research Paradigm for Optimization Evaluation. Vertical axis: Performance. Horizontal axis: Configuration of Components (old, substituted, augmented, new). Acceptability criteria: Index, gain over prior performance levels with other discrete components.]
In this type of evaluation the goal may take one of two forms:

- maintaining performance when substitute components (possibly cheaper or less complicated) are employed; or
- maximizing performance by revising or augmenting existing components, or by developing new configurations.

The "causal" inferences are derived from the continuance or increase of the desired performance level under the substitution or improvement treatment conditions.
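For optimization studies, the acceptability index is the gain over the old configuration. This Python sketch makes these assumptions:

- There is a single performance mean per configuration.
- A hypothetical tolerance decides when a cheaper substitute has "maintained" performance.

All numbers are illustrative.

```python
# Sketch of the optimization index: gain over the performance of the old
# configuration. Performance values and the tolerance are hypothetical.

baseline = 72.0  # mean performance with the old configuration
alternatives = {"substituted": 71.5, "augmented": 76.0, "new": 69.0}

def classify(perf, baseline, tolerance=1.0):
    """Return (gain, verdict); losses within `tolerance` count as maintained."""
    gain = perf - baseline
    if gain > 0:
        return gain, "improved"
    if gain >= -tolerance:
        return gain, "maintained"
    return gain, "worse"

results = {name: classify(p, baseline) for name, p in alternatives.items()}
for name, (gain, verdict) in results.items():
    print(f"{name}: gain={gain:+.1f} -> {verdict}")
```

Under this rule a substitute that loses a little performance can still be acceptable, which matches the "maintenance" goal described above.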

Causal Modeling 

Much of the work on causal models in non-experimental research represents an attempt to reason causally from correlational findings. The "cause" is inferred from some observed uniformity of relationship between certain sets of variables. The inference is derived from a combination of logical and mathematical procedures, and culminates in a set of predictions about what the empirical findings would be if we performed the "experiment" (Blalock and Blalock, 1968; Wittrock and Wiley, 1970; and Goldberger, 1972). As most of the techniques are used, a model is set forth regarding the hypothetical flow of causal influence. The statistical procedures then attempt to estimate how much change in a dependent variable would be associated with a given magnitude of change in one or more of the hypothesized causal indicators.
The flow of influence may be relatively straightforward, i.e., an x → y relationship. It may also involve mediated influence, as in the case where x influences y and y subsequently influences z. It is well known that difficulties may arise in such analyses because of error of measurement, unequal precision of measurement, and sample bias, as well as other factors (Wiley and Wiley, 1970; Hauser and Goldberger, 1971; Cochran, 1972; Cronbach and Furby, 1970; and Wiley, 1973).

In addition, where many potential indicators are involved, the researcher may need further techniques, such as:

- multivariate techniques to composite and reduce the variables into a more manageable set (Wiley, 1970);
- the creation of new pseudo-variates;
- transformations to normalize distributions and linearize relationships; or
- more complex solutions (Davis, 1973; Poirier, 1973).
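The mediated flow x → y → z, and the measurement-error problem, can both be seen in a small simulation. This Python sketch rests on these assumptions:

- The path coefficients are hypothetical (a = 0.5, b = 0.8), and the errors are standard normal.
- The simulated chain has no direct x → z path. So the simple slope of z on y estimates b, and the indirect effect is the product a·b.
- In the last step x is observed with added error of equal variance (reliability 0.5). This should roughly halve the estimated slope, the classical attenuation effect.

```python
import random

def slope(x, y):
    """Least-squares slope of y on x: cov(x, y) / var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(1)
n = 20_000
A, B = 0.5, 0.8  # hypothetical true path coefficients

x = [rng.gauss(0, 1) for _ in range(n)]
y = [A * xi + rng.gauss(0, 1) for xi in x]  # x -> y
z = [B * yi + rng.gauss(0, 1) for yi in y]  # y -> z, no direct x -> z path

a_hat, b_hat = slope(x, y), slope(y, z)
print("a:", round(a_hat, 2), "b:", round(b_hat, 2))
# In a pure chain the total x -> z slope should match the indirect effect a*b.
print("indirect a*b:", round(a_hat * b_hat, 2), "total x->z:", round(slope(x, z), 2))

# Observe x with independent error of equal variance (reliability 0.5).
x_obs = [xi + rng.gauss(0, 1) for xi in x]
print("attenuated a:", round(slope(x_obs, y), 2))  # roughly 0.5 * A
```

If the model also had a direct x → z path, b would have to be estimated with x controlled. The simple-slope shortcut is valid only for the pure chain simulated here.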

All of the preceding discussion points to a somewhat predictable amount of uncertainty in the inferences derived from causal model analysis of non-experimental research data. That uncertainty is an acceptable or tolerable by-product. Such causal analysis usually provides the researcher with a more precisely explicated model, which he may subsequently verify by later formal experimentation. The subsequent verification step is an important one, and it is one of the reasons why these approaches might be referred to as exploratory causal analysis.

Now, suppose we assume that a causal model is essentially a representation of reality. Then fitting a given causal model to a set of data tests the veridicality of that particular representation of reality against the observed events. In actual practice the researcher strives to gain credibility for the particular set of explanatory variables which comprises his causal model in two ways: partly by demonstrating the degree of veridicality of the model, and partly by systematically ruling out other explanations of the same events, as Tukey (1954) and, later, others have pointed out. One of the more acceptable practices is to rule out some of these other explanatory variables through statistical designs and sampling procedures. Such procedures minimize the opportunity for selection bias to occur. They also reduce the likelihood that variables not specified in, or outside of, our model can systematically influence the outcomes of interest or lead to spurious causal attribution.

This would reflect the use of "tight" design features to help arrive at statements of causality within and between program components and program outcomes.

Suppose, on the other hand, that we wish to test only one causal model. This would be some a priori model in which we have a vested interest, and to which four to five years of previous instrumentation and technology development work had been allocated. An example is a particular early childhood instructional program with multiple curriculum and teacher training components. The "program" could be considered a formalized causal model, with many identifiable components and potential linkages as represented in the program hardware and software.

If well defined and characterized, such a program could be a "strong," "important," and "general" causal model:

- "strong," in the sense that it, or its set of components, is falsifiable;
- "important," in the sense that it, or its sets of components, would have direct policy implications for practice; and
- "general," to the extent that its formal program components systematically incorporate sets of variables commonly distinguished within other theoretical models of the same outcomes.

Put another way, suppose we have performed all of the theoretical extrapolations and the basic and applied research, and have completed the design and development of most of the technology for a given educational program. We are then beyond the exploratory stage in causal analysis of program sources of influence, and have moved into the confirmatory stage.

In fact, what we literally mean by the term "program" is that we have developed a system of variables and embedded it in a set of products and materials. We have a "model" of how this system of variables works, within components and between components. We believe that each of these components behaves in a causal fashion as the description of the program states, and in a manner consistent with our theoretical notions. When we have a "program" structured in such detail, we don't need the kind of experimental manipulation required following conventional exploratory causal analysis. The structure of the program can be directly mapped into a causal model, reflecting its own program-generated manipulations, and verified directly.

Moreover, we can be categorically less concerned about spuriousness when there are a large number of predicted causal linkages. It is not likely that all such linkages could arise accidentally within the mass of detailed data available. Still less likely is that accidental linkages could compete with a tightly constructed theoretical rationale and program logic relating the detailed elements of the causal model conceptually. Thus, in effect, we have substituted "tight" theory for "tight" design to guide our causal inferences.
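The claim that many predicted linkages are unlikely to arise accidentally can be put in rough numbers. This Python sketch makes deliberately unrealistic assumptions:

- The k linkages are independent.
- Each would appear "significant" by chance with probability α = 0.05.

Real linkages are correlated, so this only illustrates the direction of the argument. Under these assumptions, the probability that at least m of k null linkages appear by chance is the binomial tail Σ_{j=m..k} C(k, j) α^j (1 − α)^(k−j).

```python
from math import comb

def chance_of_at_least(m, k, alpha=0.05):
    """P(at least m of k independent null linkages appear 'significant' by chance).

    Assumes independence and a fixed per-link false-positive rate `alpha`.
    """
    return sum(comb(k, j) * alpha**j * (1 - alpha) ** (k - j) for j in range(m, k + 1))

print(chance_of_at_least(1, 1))   # a single predicted link: 0.05
print(chance_of_at_least(8, 10))  # 8 of 10 predicted links: on the order of 1e-9
```

The contrast between one predicted link and a web of ten is the quantitative face of substituting "tight" theory for "tight" design.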

Moreover, suppose there exists a structured correspondence between the educational program we have developed and its causal model. Then everything that the components of the program achieve can be achieved by the components explicated within the causal model, and can be described in the language of the theoretical framework we have developed for it. Confirmatory causal analysis seeks to determine two things: whether such a structural correspondence exists, and whether the results of the causal model can be systematically obtained from the implementation or use of the formalized "program."
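One classical way to check whether correlational data correspond to a recursive causal structure is the Simon-Blalock approach. A chain x → y → z implies that x and z are uncorrelated once y is controlled, i.e., r_xz ≈ r_xy·r_yz. This Python sketch makes these assumptions:

- The correlations are hypothetical.
- A fixed tolerance stands in for a proper significance test, which would account for sample size.

```python
from math import sqrt

def partial_r(r_xz, r_xy, r_yz):
    """First-order partial correlation of x and z, controlling for y."""
    return (r_xz - r_xy * r_yz) / sqrt((1 - r_xy**2) * (1 - r_yz**2))

def fits_chain(r_xz, r_xy, r_yz, tol=0.05):
    """The chain x -> y -> z predicts a (near-)zero partial correlation r_xz.y.

    `tol` is an assumed cutoff, not a significance test.
    """
    return abs(partial_r(r_xz, r_xy, r_yz)) < tol

print(fits_chain(0.31, 0.6, 0.5))  # True: consistent with the chain
print(fits_chain(0.45, 0.6, 0.5))  # False: suggests a direct x -> z linkage
```

Each predicted vanishing partial correlation is one falsifiable consequence of the model's structure. A program with many components yields many such checks.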

Application of Causal Modeling to the Evaluation of a Program

From 1971 through 1973 a group of evaluators worked closely with the developers of a preschool program. The staffs first worked together to analyze the program, to identify major and minor program components, and to link these components to expected outcomes. This analysis revealed that several important components were either missing or deficient. The cooperative venture also revealed that the program had no measurement system for tracking a major set of input variables (dealing with teachers' behavior) or for assessing the full range of desired outcomes. Once the program was strengthened and the measurement system was created, the program could be evaluated within a theoretical framework. In that framework, the program's system of components was both logically and operationally linked to anticipated outcomes by a common language system within a strong causal model.

The evaluation was conducted during the 1972-73 school year in 94 kindergarten classes in two cities. Degree-of-implementation measures were collected on the major program input variables, and a record of curriculum coverage was kept on each child. Pre- and post-tests were administered to experimental and control groups on two instruments: a standardized test, and a test constructed to assess all of the objectives of the program.

The original causal model of the program is depicted in Figure 1 (next page). The model includes background variables which might have had direct effects on program variables and outcomes. Individual-level variables are separated from class-level variables.

A two-stage analysis procedure was employed on the data collected during the evaluation effort. The first stage tested the effects of six variables by a regression analysis: age, deviated age, initial level of achievement, curriculum coverage, curriculum mastery, and post-test achievement. The later variables in this sequence served first as criteria for earlier variables and then as predictors for subsequent variables in the model.
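The first-stage analysis amounts to a recursive series of regressions along a hypothesized causal order. This stdlib-only Python sketch makes these assumptions:

- The data are simulated.
- The ordering is a reduced, hypothetical one: initial achievement, coverage, mastery, post-test.
- The coefficients are invented, and coverage affects the post-test only through mastery.

Ordinary least squares is solved through the normal equations.

```python
import random

def ols(predictors, y):
    """Least-squares coefficients [intercept, b1, ..., bk] via the normal equations."""
    n = len(y)
    cols = [[1.0] * n] + predictors
    k = len(cols)
    # Augmented normal-equation matrix [X'X | X'y].
    m = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(k)]
         + [sum(a * b for a, b in zip(cols[i], y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(k):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [a - f * b for a, b in zip(m[r], m[c])]
    return [m[i][k] / m[i][i] for i in range(k)]

rng = random.Random(7)
n = 5000
# Hypothetical recursive model: coverage affects post-test only through mastery.
initial = [rng.gauss(0, 1) for _ in range(n)]
coverage = [0.6 * i + rng.gauss(0, 1) for i in initial]
mastery = [0.3 * i + 0.5 * c + rng.gauss(0, 1) for i, c in zip(initial, coverage)]
post = [0.4 * i + 0.5 * s + rng.gauss(0, 1) for i, s in zip(initial, mastery)]

data = {"initial": initial, "coverage": coverage, "mastery": mastery, "post": post}
order = ["initial", "coverage", "mastery", "post"]  # hypothesized causal order

# Each variable is regressed on all earlier variables (serving as a criterion)
# and then enters as a predictor in every later regression.
paths = {}
for j in range(1, len(order)):
    target, earlier = order[j], order[:j]
    coefs = ols([data[v] for v in earlier], data[target])
    paths[target] = dict(zip(earlier, coefs[1:]))  # drop the intercept

for target, effects in paths.items():
    print(target, {v: round(b, 2) for v, b in effects.items()})
```

A near-zero direct coefficient for coverage in the post-test equation, alongside a clear coverage-to-mastery path, is the signature of a mediated effect. This parallels the kind of finding the evaluation reports.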

The second analysis tested the effects of variation in the degree of implementation of program variables operating at the classroom level. It used residualized class means for initial achievement, coverage, mastery, and post-achievement. These means were derived by regressing the other five variables on a given variable and then computing the average residual for a given class.

The results of the two analyses are shown in Figure 2 (next page). Wherever a program variable has been shown to have a strong relationship to a subsequent program variable or to the major dependent variable, we present raw regression coefficients with the standard error for each in parentheses. The results are especially helpful in demonstrating the value of the causal modeling procedures discussed earlier.

First, consider the eleven "essentials" of the Darcee Program:

- none has been shown to have a direct influence on post-test achievement at the class level;
- two are shown to have a strong direct influence on curriculum coverage; and
- three have a direct effect on mastery scores.

Moreover, for the latter, all of the effects on post-test achievement are mediated through the coverage variable.
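The residualized class means can be sketched as follows. The text's wording admits more than one reading, so this Python sketch adopts a common one:

- The variable of interest is regressed on a covariate across all children, and the residuals are averaged within each class.
- For brevity it uses a single covariate (initial achievement) rather than five.
- Class sizes, implementation scores, and effect sizes are simulated, hypothetical values.

```python
import random
from collections import defaultdict

def slope_intercept(x, y):
    """Simple least-squares regression of y on x; returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return b, my - b * mx

rng = random.Random(3)
records = []          # (class_id, initial, post) for each child
implementation = {}   # hypothetical degree-of-implementation score per class
for cls in range(40):
    implementation[cls] = rng.uniform(0, 2)
    for _ in range(25):
        initial = rng.gauss(0, 1)
        post = 0.5 * initial + 0.8 * implementation[cls] + rng.gauss(0, 1)
        records.append((cls, initial, post))

# Step 1: residualize post-test on initial achievement across all children.
b, a = slope_intercept([r[1] for r in records], [r[2] for r in records])
by_class = defaultdict(list)
for cls, initial, post in records:
    by_class[cls].append(post - (a + b * initial))

# Step 2: the average residual per class is the residualized class mean.
resid_mean = {cls: sum(v) / len(v) for cls, v in by_class.items()}

# Step 3: relate class-level implementation to the residualized means.
classes = sorted(resid_mean)
effect, _ = slope_intercept([implementation[c] for c in classes],
                            [resid_mean[c] for c in classes])
print(round(effect, 2))  # should land near the simulated 0.8
```

Residualizing first removes differences in entering achievement, so the class-level slope reflects implementation rather than which children a class happened to enroll.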

Second, the initial level of achievement has been shown to be a strong source of influence on coverage, mastery, and post-achievement. In fact, the pervasiveness of the effects of this variable, especially on the curriculum coverage scores, highlights a previously unconsidered issue in educational program evaluation.

The relationship between initial level of achievement and coverage was strongly positive. This indicates that teachers were covering what children had already shown on the pre-test that they could master. The evaluation issue centers on the consequences of individualizing instruction when the individualization strategy works with skills that children already know. An equally important consideration is how program evaluation procedures might detect such consequences. Certainly the use of micro-evaluation techniques is necessary when program influences are likely to be subtle or complex.



Since it was not the purpose of this presentation to deal with all of the results from the evaluation of this preschool program, we have not presented all of the evidence we collected. The use of causal modeling has provided us with insight into how programs work and how well they work. In the case of this program, other evidence clearly indicates that the program was extremely successful in training teachers to behave consistently with the objectives of training, i.e., to implement the Darcee "essentials." The finding that these "essentials" were minor sources of influence on achievement may say less about the planned part of the program. It may instead reveal the non-productive way in which the curriculum activities were actually used to facilitate achievement. The opportunity to focus on this kind of problem and to correct it is one of the benefits that such causal analyses can lead to.